Three Layers — Tool, MCP, Skill
Agent systems need three different kinds of capability surfaces: atomic actions, externally delivered primitives, and workflow instructions. In Codex terms, MCP servers can expose tools, resources, and prompts; skills package reusable workflow guidance; and plugins can bundle skills, MCP servers, apps, and assets for distribution (MCP specification, Codex skills docs, Codex plugins docs). The useful mental model is not "which one is better?" It is "which layer should own this kind of change?"
The Stack
| Layer | What it owns | Change cost | Best use |
|---|---|---|---|
| Tool call | One atomic action with a schema | High if built into the agent, medium if external | Deterministic work such as search, file IO, database reads, API calls |
| MCP primitives | Tools, resources, or prompts exposed by an external MCP server | Medium | Shared integrations, team-owned services, third-party capabilities |
| Skill | Written workflow instructions the model reads | Low | Multi-step procedures, conventions, reviews, writing workflows |
| Plugin | Packaging and distribution | Medium | Bundling skills, MCP servers, and apps into an installable unit |
I use "tool call" for the generic agent-systems abstraction. OpenAI's function-calling guide describes application-defined function tools with schemas, while Codex docs define MCP servers, skills, and plugins as separate extension surfaces (OpenAI function calling guide, Codex glossary).
Tool Call
A tool call is a typed action. The model supplies arguments, the runtime executes code, and the result returns as structured data. The value comes from rigidity: the action has a schema, a name, a permission boundary, and a predictable side effect.
That rigidity is why tool calls should own atomic work. If the agent needs to query a database, send an email, read a file, or call an API, code should do it. The model can decide when the action is relevant, but the action itself should not be improvised in prose.
The failure mode is overloading. If you encode a whole workflow into one tool, every process change becomes a code change. The tool becomes hard to test, hard to review, and hard to reuse.
MCP
The Model Context Protocol gives agents a standard way to connect to external servers that provide tools, resources, and prompts. The official architecture uses a host, an MCP client, and an MCP server; the data layer uses JSON-RPC 2.0, while the transport can run over stdio or Streamable HTTP (MCP architecture overview, MCP specification).
Architecturally, an MCP tool is still a tool. It has a callable shape and returns data. MCP resources are URI-identified context objects, and MCP prompts are reusable templates served through the protocol (MCP tools spec, MCP resources spec). The difference is ownership. The implementation lives outside the agent core, often in a separate process or service. That lets a database team, internal platform team, or third-party package maintain the capability without changing the agent itself.
The failure mode is treating MCP as a reasoning layer. MCP does not decide how to solve the task. It transports capabilities and context. The model still has to choose the right call, and skills or system instructions still have to guide multi-step behavior.
Skill
Codex docs define skills as reusable workflow packages that include instructions and can include supporting scripts or references. Codex uses progressive disclosure: the model initially sees skill names and descriptions, then loads the full SKILL.md when the task calls for it (Codex skills docs, Codex glossary).
A skill is interpreted text. It does not expose a callable API by itself. It tells the model how to coordinate existing capabilities: which tools to use, what order to follow, what constraints matter, and what checks should happen before delivery. MCP prompts can also package reusable prompt templates, but they live in the protocol layer; Codex skills are local workflow packages with instructions, optional scripts, and references.
That makes skills good for work with judgment. Article rewriting, code review, incident triage, document rendering, and research workflows all have steps, exceptions, and taste. Encoding that into a markdown workflow gives you a fast way to update the process without rewriting the tools underneath it.
Plugin
Plugin is the packaging layer. Codex docs describe plugins as installable bundles that can include skills, MCP servers, apps, and other assets (Codex plugins docs, Build plugins docs). That makes plugins orthogonal to the tool-versus-skill distinction.
A plugin might ship an MCP server for a database, a skill that explains the review workflow for that database, and an app surface for humans to inspect results. The plugin does not replace tools or skills. It distributes them.
Boundary Rules
The clean boundary is simple:
| Question | Put it in |
|---|---|
| Does it perform one deterministic action? | Tool call |
| Does it expose external tools, resources, or prompt templates over a protocol? | MCP |
| Does it describe how to run a workflow? | Skill |
| Does it bundle capabilities for installation? | Plugin |
If a skill says "query the database," an actual tool must exist. If a tool tries to "handle onboarding well," it is probably hiding a workflow that belongs in a skill. If an MCP server contains broad business process logic, it may become a remote monolith instead of a capability provider.
Valid Combinations
You can build with any subset. A small internal agent may only need built-in tool calls. A writing or review assistant may need tools plus skills. A company-wide system with many integrations usually needs MCP servers so teams can own their services independently. A distributed product needs plugins to package and install the pieces.
The mature pattern is stable actions at the bottom and flexible instructions at the top. Keep tools narrow. Keep MCP servers accountable to external capability ownership. Keep skills readable enough that a human can audit the workflow. Package with plugins when installation and reuse become the problem.
My Take
Most agent architecture mistakes come from putting change in the wrong layer. Teams put process logic into code and then wonder why the agent is expensive to update. Or they put deterministic actions into instructions and then wonder why the model behaves inconsistently. The stack works when each layer absorbs the kind of change it was built for.
Takeaways
Tool calls give an agent reliable actions. MCP delivers external tools, resources, and prompt templates under a standard protocol. Skills tell the model how to coordinate actions across a workflow. Plugins package the pieces for installation and reuse. The architecture stays healthy when deterministic behavior lives in code, shared integrations live behind MCP, and evolving process knowledge lives in skills.
References
- OpenAI, Function calling guide
- OpenAI Codex docs, Skills
- OpenAI Codex docs, MCP
- OpenAI Codex docs, Glossary
- OpenAI Codex docs, Plugins
- OpenAI Codex docs, Build plugins
- OpenAI Codex docs, Customization
- Model Context Protocol specification
- Model Context Protocol architecture overview
- Model Context Protocol tools
- Model Context Protocol resources
author: Arii tag: #ai links: [[Jailbreaking LLMs]], [[Small LLMs — Use Cases and Limits]], [[Multi-Token Prediction]]
